30 research outputs found

End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations

Author: Cernocky Jan
Saraclar Murat
Yusuf Bolaji
Publication date: 15/08/2023

Conventional keyword search systems operate on automatic speech recognition (ASR) outputs, which causes them to have a complex indexing and search pipeline. This has led to interest in ASR-free approaches to simplify the search procedure. We recently proposed a neural ASR-free keyword search model which achieves competitive performance while maintaining an efficient and simplified pipeline, where queries and documents are encoded with a pair of recurrent neural network encoders and the encodings are combined with a dot-product. In this article, we extend this work with multilingual pretraining and detailed analysis of the model. Our experiments show that the proposed multilingual training significantly improves the model performance and that despite not matching a strong ASR-based conventional keyword search system for short queries and queries comprising in-vocabulary words, the proposed model outperforms the ASR-based system for long queries and queries that do not appear in the training data.

Comment: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 202

arXiv.org e-Print Archive
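
The abstract above says that query and document encodings are combined with a dot-product. A minimal sketch of that scoring step follows. It is not the authors' implementation: the hand-written vectors stand in for the outputs of the two recurrent encoders, and scoring each document frame separately and squashing the result with a sigmoid are assumptions made here for illustration. The names `dot` and `frame_scores` are hypothetical.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def frame_scores(query_vec, doc_frames):
    """Score every document frame against a single query encoding.

    Assumption: the score is the sigmoid of the dot product, so it can be
    read as a rough per-frame probability that the query occurs there.
    """
    return [1.0 / (1.0 + math.exp(-dot(query_vec, frame))) for frame in doc_frames]


# Stand-in encodings; a real system would obtain these from trained encoders.
query_vec = [1.0, 0.0, -1.0]
doc_frames = [
    [2.0, 0.5, -2.0],   # points the same way as the query, so a high score
    [0.0, 1.0, 0.0],    # orthogonal to the query, so a score of exactly 0.5
    [-2.0, 0.0, 2.0],   # points the opposite way, so a low score
]
scores = frame_scores(query_vec, doc_frames)
```

A detection threshold on `scores` would then decide which frames count as hits; the threshold value is a tuning choice and is not given in the abstract.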

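Quantization of variable-length spectral sequences for very low bit-rate speech coding (original title: Quantification de séquences spectrales de longueurs variables pour le codage de la parole à très bas débit)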

Author: Baudoin Geneviève
Cernocky Jan
Chollet Gérard
Publication venue: GRETSI, Groupe d’Etudes du Traitement du Signal et des Images
Publication date: 01/01/1997

This paper addresses the coding of spectral parameters for very low bit-rate speech coding. We present a new interpretation of research previously published by Chou-Lookabaugh and Cernocky-Baudoin-Chollet on the quantization of variable-length spectral sequences, under the respective names "Variable to Variable length Vector Quantization" (VVVQ) and multigram quantization (MGQ). We also studied the influence of limiting the delay introduced by the method and propose a technique for optimizing performance when a maximum delay is imposed. We found that a delay of 400 ms is generally sufficient. Finally, we propose introducing long sequences into the codebook by linear interpolation of short sequences.

I-Revues
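
The last sentence of the abstract above mentions building long codebook sequences by linear interpolation of short ones. The sketch below shows one possible reading of that idea: a short sequence of spectral vectors is stretched to a larger number of frames by interpolating linearly along the time axis. The function name `stretch` and the toy two-dimensional vectors are hypothetical, and the paper's actual procedure may differ.

```python
def stretch(seq, new_len):
    """Linearly interpolate a sequence of vectors to new_len frames.

    The first and last frames are preserved; intermediate frames are
    weighted averages of the two nearest original frames.
    """
    if len(seq) < 2 or new_len < 2:
        raise ValueError("need at least two input frames and two output frames")
    out = []
    for i in range(new_len):
        # Fractional position of output frame i on the original time axis.
        pos = i * (len(seq) - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, len(seq) - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(seq[lo], seq[hi])])
    return out


# Toy example: a two-frame sequence stretched to three frames.
short_seq = [[0.0, 10.0], [2.0, 20.0]]
long_seq = stretch(short_seq, 3)
```

Here the middle frame of `long_seq` is the average of the two original frames, which is what linear interpolation gives at the halfway point.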

An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification

Author: Burget Lukas
Cernocky Jan
Mosner Ladislav
Peng Junyi
Plchot Oldrich
Stafylakis Themos
Publication date: 03/10/2022

In recent years, the self-supervised learning paradigm has received extensive attention due to its great success in various downstream tasks. However, the fine-tuning strategies for adapting those pre-trained models to the speaker verification task have yet to be fully explored. In this paper, we analyze several feature extraction approaches built on top of a pre-trained model, as well as regularization and a learning rate schedule to stabilize the fine-tuning process and further boost performance. Multi-head factorized attentive pooling is proposed to factorize the comparison of speaker representations into multiple phonetic clusters. We regularize towards the parameters of the pre-trained model, and we set different learning rates for each layer of the pre-trained model during fine-tuning. The experimental results show that our method can significantly shorten the training time to 4 hours and achieve state-of-the-art (SOTA) performance: 0.59%, 0.79% and 1.77% equal error rate (EER) on Vox1-O, Vox1-E and Vox1-H, respectively.

Comment: Accepted by SLT202

arXiv.org e-Print Archive
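
The abstract above mentions two fine-tuning devices: a different learning rate for each layer of the pre-trained model, and regularization towards the pre-trained parameters. The sketch below illustrates both in plain Python. It is an illustration under stated assumptions, not the authors' recipe: the geometric decay of the learning rate from the top layer downwards and the squared-distance form of the penalty are assumptions, and the names `layer_learning_rates` and `l2_to_pretrained` are hypothetical.

```python
def layer_learning_rates(base_lr, num_layers, decay):
    """Return one learning rate per layer, index 0 being the lowest layer.

    Assumption: the top layer uses base_lr and each layer below it is
    scaled by a further factor of decay.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def l2_to_pretrained(params, pretrained, weight):
    """Penalty that grows as parameters drift from their pre-trained values.

    Assumption: a weighted sum of squared differences, added to the task loss.
    """
    return weight * sum((p - q) ** 2 for p, q in zip(params, pretrained))


# Toy values chosen so the arithmetic is easy to follow by hand.
rates = layer_learning_rates(base_lr=1.0, num_layers=3, decay=0.5)
penalty = l2_to_pretrained([1.0, 2.0], [0.0, 2.0], weight=0.5)
```

In a deep learning framework, the per-layer rates would typically be passed to the optimizer as separate parameter groups, and the penalty would be added to the training loss before back-propagation.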

Mobile Biometry (MOBIO) Face and Speaker Verification Evaluation

Author: Ahonen Timo
Cernocky Jan
Marcel Sébastien
Matejka Pavel
McCool Chris
Publication venue: rue Marconi 19, Idiap
Publication date: 26/08/2010

This paper evaluates the performance of face and speaker verification techniques in the context of a mobile environment. The mobile environment was chosen because it provides a realistic and challenging test-bed in which biometric person verification techniques must operate. For instance, the audio environment is quite noisy, and there is limited control over the illumination conditions and the pose of the subject for the video. To conduct this evaluation, a part of a database captured during the "Mobile Biometry" (MOBIO) European Project was used. In total, nine participants in the evaluation submitted a face verification system and five participants submitted speaker verification systems. The nine face verification systems all varied significantly in terms of both verification algorithms and face detection algorithms. Several systems used the OpenCV face detector, while the better systems used proprietary software for the task of face detection. This made the evaluation of verification algorithms challenging. The five speaker verification systems were based on one of two paradigms: a Gaussian Mixture Model (GMM) or a Support Vector Machine (SVM) paradigm. In general, the systems based on the SVM paradigm performed better than those based on the GMM paradigm.

Infoscience - École polytechnique fédérale de Lausanne

MOBIO: Mobile Biometric Face and Speaker Authentication

Author: Atanasoaei Cosmin
Cernocky Jan
Helistekangas Mika
Marcel Sébastien
Matejka Pavel
McCool Chris
Pesan Jan
Tarsetti Flavio
Turtinen Markus
Publication venue: rue Marconi 19, Idiap
Publication date: 26/08/2010

This paper presents a demonstration system for mobile biometric person authentication. It verifies a user's claimed identity by biometric means, specifically using their face and voice simultaneously, on a Nokia N900 mobile device with its built-in sensors (frontal video camera and microphone).

Infoscience - École polytechnique fédérale de Lausanne